Interpretability module: sparse linear models via LASSO #120

luigibonati · 2024-01-30T00:55:36Z

Description

Add sparse linear models optimized via LASSO as tools for interpreting the CVs and/or the resulting states, as done here: https://pubs.acs.org/doi/abs/10.1021/acs.jctc.2c00393.

I started from the notebook that @pietronvll and I wrote. We implemented both the classification case (as done in stateinterpreter) and the regression one. A few changes:

I extended the functions to also handle the multi-class case.
I changed the scoring function to use balanced_accuracy_score instead of the standard accuracy, to account for imbalanced datasets.

For both regression and classification the signature is (almost) the same: both return the optimized estimator together with the list of non-zero features and their coefficients. I also wrote separate functions to plot the results (coefficient paths, score, and number of features).
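
To illustrate the idea described above, here is a minimal sketch of an L1-penalized classifier scored with balanced accuracy. It is not the code of this PR: the helper name `fit_sparse_classifier`, its arguments, the `saga` solver, and the standardization step are all assumptions made for the example; only `LogisticRegressionCV` and the balanced-accuracy scoring come from the description.

```python
import numpy as np
from sklearn.linear_model import LogisticRegressionCV
from sklearn.preprocessing import StandardScaler


def fit_sparse_classifier(X, y, feature_names, Cs=10, cv=3, seed=0):
    """Hypothetical helper (not the PR's API): cross-validated L1 logistic regression."""
    # Standardize so that coefficient magnitudes are comparable across features.
    X_std = StandardScaler().fit_transform(X)
    clf = LogisticRegressionCV(
        Cs=Cs,                        # number of regularization strengths to scan
        penalty="l1",                 # LASSO-like penalty, drives coefficients to zero
        solver="saga",                # a solver that supports the L1 penalty
        scoring="balanced_accuracy",  # more robust than accuracy on imbalanced data
        cv=cv,
        max_iter=5000,
        random_state=seed,
    )
    clf.fit(X_std, y)

    # Binary problems have a single coefficient row, which refers to classes_[1];
    # multi-class problems have one row per class.
    labels = clf.classes_.tolist()
    if clf.coef_.shape[0] == 1:
        labels = labels[1:]

    selected = {}
    for label, coefs in zip(labels, clf.coef_):
        nonzero = np.flatnonzero(coefs)
        selected[label] = [(feature_names[i], float(coefs[i])) for i in nonzero]
    return clf, selected


# Toy usage: only the first feature carries information about the label.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] > 0).astype(int)
names = [f"f{i}" for i in range(5)]
clf, selected = fit_sparse_classifier(X, y, names)
print(selected)
```

The returned dictionary maps each class label to its non-zero features and coefficients, which mirrors the "estimator plus selected features" return convention mentioned above.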

Todos

Notable points that this PR has either accomplished or will accomplish.

Function: lasso_classification (based on scikit-learn's LogisticRegressionCV)
Function: lasso_regression (based on scikit-learn's LassoCV)
Plotting functions
Docstrings
Regtests
Raise error when importing module if scikit-learn is not installed
Add documentation pages
Add scikit-learn dependency to GA
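
For the regression item in the list above, a comparable sketch built on `LassoCV` might look as follows. Again, this is an assumption-laden illustration rather than the PR's implementation: `fit_sparse_regressor` and its arguments are hypothetical names, and the check on the target shape only mimics the "raise error if target is not scalar" behavior mentioned in the commit history.

```python
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import StandardScaler


def fit_sparse_regressor(X, y, feature_names, cv=5, seed=0):
    """Hypothetical helper (not the PR's API): cross-validated LASSO regression."""
    y = np.asarray(y)
    # Only a scalar target per sample is handled in this sketch.
    if y.ndim != 1:
        raise ValueError("The target must be a 1D array (one scalar per sample).")

    # Standardize so that coefficient magnitudes are comparable across features.
    X_std = StandardScaler().fit_transform(X)
    reg = LassoCV(cv=cv, random_state=seed).fit(X_std, y)

    # Keep only the features whose coefficient survived the L1 penalty.
    nonzero = np.flatnonzero(reg.coef_)
    selected = [(feature_names[i], float(reg.coef_[i])) for i in nonzero]
    return reg, selected


# Toy usage: the target depends (almost) only on the second feature.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 3.0 * X[:, 1] + 0.1 * rng.normal(size=200)
reg, selected = fit_sparse_regressor(X, y, ["f0", "f1", "f2", "f3"])
print(reg.alpha_, selected)
```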

Tutorials

Work in progress

Tutorial: LASSO functions
Tutorial: Stateinterpreter with DeepTICA CVs (porting https://github.com/luigibonati/md-stateinterpreter/blob/main/tutorials/2_hierarchical_classification.ipynb)

Questions

This requires scikit-learn as an additional dependency, which I would keep optional.
As of now, I put these functions inside utils.lasso. However, since the sensitivity analysis already lives in utils.explain, we might move all these functions into a new module called explain?

Status

Ready to go

mlcolvar/utils/lasso.py

codecov · 2024-01-30T01:08:20Z

Codecov Report

Attention: Patch coverage is 91.90751% with 28 lines in your changes missing coverage. Please review.

Project coverage is 92.50%. Comparing base (3f9adeb) to head (71ad599).

Additional details and impacted files

mlcolvar/explain/lasso.py

+
+import matplotlib 
+import matplotlib.pyplot as plt
+import mlcolvar.utils.plot


mlcolvar/explain/lasso.py

+import mlcolvar.utils.plot
+
+try:
+    import sklearn


mlcolvar/explain/sensitivity.py

@@ -1,8 +1,10 @@
 import numpy as np
 import torch
+from matplotlib import patches as mpatches
+import matplotlib.pyplot as plt
+import mlcolvar.utils.plot


mlcolvar/explain/utils.py

mlcolvar/explain/lasso.py

mlcolvar/tests/test_explain_lasso.py

@@ -0,0 +1,7 @@
+import pytest


mlcolvar/explain/lasso.py

+try:
+    import sklearn
+except ImportError:
+    print('The lasso module requires scikit-learn as additional dependency.')
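
The snippet above only prints a message, so the import of the module itself would still succeed, while the Todos mention raising an error instead. One possible way to do that is sketched below; `require_optional` is a hypothetical helper written for this example and is not part of mlcolvar.

```python
import importlib


def require_optional(module_name, feature):
    """Hypothetical helper: import an optional dependency or fail with a clear message."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # Re-raise with an actionable message, keeping the original error as the cause.
        raise ImportError(
            f"The {feature} requires '{module_name}' as an additional dependency."
        ) from exc


# Usage: a module from the standard library is returned as usual,
# while a missing one raises ImportError with the custom message.
json_module = require_optional("json", "example feature")
try:
    require_optional("a_module_that_does_not_exist", "lasso module")
except ImportError as err:
    print(err)
```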


mlcolvar/explain/utils.py

mlcolvar/utils/plot.py

+            fig, axs = plt.subplots(n_feat, 1, figsize=(3, 3*n_feat))
+
+        plt.suptitle('Features distribution')
+        init_ax = True


mlcolvar/utils/plot.py

-        ax.set_xlim(0, None)
+        if n_feat != len(axs):
+            raise ValueError(f'Number of features ({len(features)}) != number of axis ({len(axs)})')
+        init_ax = False


mlcolvar/explain/utils.py

luigibonati · 2024-06-11T13:21:01Z

I have put everything into a new explain submodule, which contains the sensitivity analysis and the sparse models.

Will merge it soon.

pietronvll and others added 3 commits January 17, 2024 15:47

Added Lasso notebook

1ccbbce

add lasso module

b345809

uploaded drafts of lasso notebooks

e51871b

luigibonati requested a review from pietronvll January 30, 2024 00:55

github-advanced-security bot found potential problems Jan 30, 2024

View reviewed changes

luigibonati added 2 commits June 5, 2024 18:55

Merge remote-tracking branch 'origin/main' into interpretability

df937a9

create explain submodule

e80ca9c

github-advanced-security bot found potential problems Jun 5, 2024

View reviewed changes

luigibonati added 2 commits June 6, 2024 10:18

fix tests and warnings

5489c4b

add lasso regtests

f54428b

github-advanced-security bot found potential problems Jun 6, 2024

View reviewed changes

luigibonati added 9 commits June 6, 2024 10:36

fix return plot functions

55ca602

updated stateinterpreter tutorial

e71ff1f

updated lasso tutorials

91fdf8c

updated docs

0192845

fix doc and tutorials

2aec410

fix loops in lasso

b8bb9f7

updated documentation

343fea6

fix indentation in committor notebook

90fca93

fixed committor notebook

d117fab

luigibonati removed the request for review from pietronvll June 6, 2024 11:50

luigibonati and others added 8 commits June 6, 2024 15:09

long training for committor

9b7a217

Fixed tutorials

314e595

Merge branch 'interpretability' of https://github.com/luigibonati/mlc…

f712c10

…olvar into interpretability

Plot longer training results in preview

6ea925f

fix lasso tutorial

35c5c45

raise error in lasso regression if target is not scalar

15ef1d9

improved plot_features_distribution

5b26950

run sphinx to update doc

9b09f4e

github-advanced-security bot found potential problems Jun 11, 2024

View reviewed changes

remove utils

e3aec8f

fix sensitivity notebook

71ad599

luigibonati merged commit d4fb5d7 into main Jun 11, 2024
12 checks passed

luigibonati deleted the interpretability branch June 11, 2024 15:41
